Lab 5 - Wide and Deep Networks

By. Liam Lowsley-Williams

Introduction

In this lab, I will be building out a wide and deep neural network that will classify the edibility of a mushroom as poisonous or edible. I am using the UCI Machine Learning Mushroom data set which contains a grand total of 8,124 instances. After building out 3 wide and deep networks I will then compare the performance of them and how the best one compares to a multilayer perceptron.

The data set consists of several thousand mushroom records drawn from The Audubon Society Field Guide to North American Mushrooms. It includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one.

Link To Dataset - https://www.kaggle.com/uciml/mushroom-classification

Business Understanding

Motivations

It's a temperate Sunday afternoon and you are about to set out on your weekly walk through the forest to unwind and enjoy the great outdoors. You grab your trusty walking stick and a light jacket and head out to begin your journey. An hour passes and as you walking you suddenly realize you forgot to pick up those mushrooms you needed to make your famous mushroom casserole for dinner. However, lone behold, right in front of you is a patch of some beautiful mushrooms. You grab a couple, head back home, and make your casserole. However, what you didn't know was those mushrooms were poisonous and luckily you make it to the ER before its too late.

How amazing would it have been if all of that could have been avoided with a simple and easy to use app? Not only would you have not ended up in the hospital but you could have thoroughly enjoyed your mushroom casserole! Mushroom Hunting is a real activity and tens to hundreds of thousands of people partake. It is most popular in Europe, Australia, Japan, but most importantly, temperate regions of the United States and Canada. The data set is collected pertains to mushrooms on the North American continent, thus we have a target audience and potential consumers. [1]

In addition to the above concern, determining whether a mushroom is poisonous or not could be immensely useful in finding out if what your pet or child just ate could prove to be fatally harmful to them. Using an app to quickly check the mushroom and determine its edibility could be used to help indicate whether immediate action is required or not.

This is where the motivation stems from. I wish to use this mushroom data to construct a neural network that specializes in predicting whether a mushroom is poisonous based on its features.

Objectives

As stated above, the motivation behind using this data set was to construct a neural network capable of sufficiently predicting the edibility of a mushroom given its characteristics. That means we are looking for an extremely well-performing network that could then be loaded into an app that would aid users in finding out if that mushroom they just saw on their walk is in fact poisonous.

With the increasing adoption of eating natural and healthy, this app would make a beneficial contribution to that trend. People nowadays want to eat things that are grown naturally and free of the scary chemicals some corporations pump into their foods to save money. Just like how we have for the past several centuries. The increasing concern that these processed and artificial foods are actually detrimental to our health is helping a lot of people overcome prior medical issues being caused by these foods. I for one can say myself, after quitting drinking soda, I feel amazing and can't believe I drank soda for all those years.

Overall, our primary objective here is to classify the test data set correctly given our metric for evaluation stated below.

Feature Crossing

In this lab, I plan to perform the following feature crossings for the wide portion of the network. I decided to choose the following crosses as I felt that they made sense and were logical to combine...

For architecture 1 and 2 I chose to cross the following features...

  • stalk-color-below-ring & stalk-color-above-ring
  • gill-color & veil-color
  • odor & habitat

The reason I chose to cross these features is because they seem like fairly easily identifiable characteristics for other humans. The stalk colors on a given mushroom are most likely fairly close to one another so I decided to combine these features. The same can be said about the gill and veil colors, these are easily identifiable and most likely fairly consistent according to class. Lastly, I crossed odor and habitat as I saw it likely that a particular habitat will go hand in hand with a particular odor, or at least most of the time.

For architecture 3, I chose to cross the following features...

  • population, habitat, odor
  • habitat, bruises

The reason I chose to cross the features is because, like above, they seem fairly similar to one another. The population, habitat, and odor features are crossed as they seem like more meta-related information about the mushroom than actually describing it and could be useful. The habitat and bruises are crossed as well because the condition of the mushroom and whether it has bruises is primarily attributed to its environment, hence, habitat.

Evaluation

For this particular objective, I am classifying whether a mushroom is poisonous or not based on its features so a user could determine if their own mushroom is safe for consumption. Thus, I am very concerned with predicting false negatives, which would mean that I predicted a mushroom as edible when it was in fact poisonous. The reason behind this is quite self-explanatory, should a user eat a poisonous mushroom it could prove to be fatal. Whenever we are dealing with humans and fatality could be involved, our metric is extremely important in determining if a model is good or bad. We do not want a model that will occasionally predict false negatives. Therefore, given the objective, our metric for evaluation will be recall. I chose to use recall due to the fact that it is used to determine how well a model avoids false negatives. Thus we are looking for a recall rate of 100% or extremely close to it so we are never predicting if a mushroom is edible when it is actually poisonous.

$ Recall = \frac{TP}{TP + FN} $

Data Preperation

Below, I begin by preparing the data to be used by the model later on for training/testing. For this particular lab, I chose to use the UCI Mushroom Data Set and predict whether a given mushroom is poisonous or edible based on its characteristics as seen below. For this dataset, we are dealing with entirely categorical data. There are no numerical features thus all our data must be integer encoded (which we will do here) prior to training or classifying. The steps for processing the data are as follows:

  1. Check for NaN values and mediate...
    • Here I checked to see if there were any missing values, in the case of this dataset I found that the 'stalk-root' feature contained '?' values. Given the large number of features, I simply removed this feature as seen below.
  1. Apply stratified shuffle split to dataset to create 4 folds of 80% training and 20% testing data and save in a list
  1. Encode string value catagories of classification as integers and add to end of datset
  1. Extract respective X_train, X_test, y_train, and y_test data from the folds
    • Here I generated the final X & y train & test datasets. Each one of those datasets contains 4 folds from the stratified shuffle split

Attribute Information:

  • Target Classes:

    • 0 = E = edible
    • 1 = P = poisonous
  • Data Classes: (this data set is purely categorical)

image.png

In [60]:
import pandas as pd
import numpy as np
# plotly
import plotly
import plotly.graph_objects as go
from plotly.graph_objs import Scatter, Marker, Layout, XAxis, YAxis, Bar, Line
plotly.offline.init_notebook_mode()
# matplotlib
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
In [61]:
column_names = ['class',
                'cap-shape',
                'cap-surface',
                'cap-color',
                'bruises',
                'odor',
                'gill-attachment',
                'gill-spacing',
                'gill-size',
                'gill-color',
                'stalk-shape',
                'stalk-root',
                'stalk-surface-above-ring',
                'stalk-surface-below-ring',
                'stalk-color-above-ring',
                'stalk-color-below-ring',
                'veil-type',
                'veil-color',
                'ring-number',
                'ring-type',
                'spore-print-color',
                'population',
                'habitat']
                
mushrooms = pd.read_csv('SHROOM/agaricus-lepiota.data', header=None, names=column_names)
data = mushrooms
In [62]:
data.shape
Out[62]:
(8124, 23)
In [63]:
data.head()
Out[63]:
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
0 p x s n t p f c n k ... s w w p w o p k s u
1 e x s y t a f c b k ... s w w p w o p n n g
2 e b s w t l f c b n ... s w w p w o p n n m
3 p x y w t p f c n n ... s w w p w o p k s u
4 e x s g f n f w b k ... s w w p w o e n a g

5 rows × 23 columns

In [64]:
data.describe()
Out[64]:
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
count 8124 8124 8124 8124 8124 8124 8124 8124 8124 8124 ... 8124 8124 8124 8124 8124 8124 8124 8124 8124 8124
unique 2 6 4 10 2 9 2 2 2 12 ... 4 9 9 1 4 3 5 9 6 7
top e x y n f n f c b b ... s w w p w o p w v d
freq 4208 3656 3244 2284 4748 3528 7914 6812 5612 1728 ... 4936 4464 4384 8124 7924 7488 3968 2388 4040 3148

4 rows × 23 columns

In [65]:
data['stalk-root'].unique()
Out[65]:
array(['e', 'c', 'b', 'r', '?'], dtype=object)
In [66]:
mushrooms['stalk-root'].hist()
Out[66]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ada68d8e88>

We see here that the 'stalk-root' feature has a '?' classification to it. Due to this, we will simply remove the whole column as more than half the dataset has a '?' for that feature.

In [67]:
data.drop(['stalk-root'], inplace=True, axis=1)
In [68]:
data_counts = data.apply(pd.value_counts).fillna(0)
data_counts
Out[68]:
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
a 0.0 0.0 0.0 0.0 0.0 400.0 210.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 384.0 0.0
b 0.0 452.0 0.0 168.0 0.0 0.0 0.0 0.0 5612.0 1728.0 ... 0.0 432.0 432.0 0.0 0.0 0.0 0.0 48.0 0.0 0.0
c 0.0 4.0 0.0 44.0 0.0 192.0 0.0 6812.0 0.0 0.0 ... 0.0 36.0 36.0 0.0 0.0 0.0 0.0 0.0 340.0 0.0
d 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3148.0
e 4208.0 0.0 0.0 1500.0 0.0 0.0 0.0 0.0 0.0 96.0 ... 0.0 96.0 96.0 0.0 0.0 0.0 2776.0 0.0 0.0 0.0
f 0.0 3152.0 2320.0 0.0 4748.0 2160.0 7914.0 0.0 0.0 0.0 ... 600.0 0.0 0.0 0.0 0.0 0.0 48.0 0.0 0.0 0.0
g 0.0 0.0 4.0 1840.0 0.0 0.0 0.0 0.0 0.0 752.0 ... 0.0 576.0 576.0 0.0 0.0 0.0 0.0 0.0 0.0 2148.0
h 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 732.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1632.0 0.0 0.0
k 0.0 828.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 408.0 ... 2304.0 0.0 0.0 0.0 0.0 0.0 0.0 1872.0 0.0 0.0
l 0.0 0.0 0.0 0.0 0.0 400.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1296.0 0.0 0.0 832.0
m 0.0 0.0 0.0 0.0 0.0 36.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 292.0
n 0.0 0.0 0.0 2284.0 0.0 3528.0 0.0 0.0 2512.0 1048.0 ... 0.0 448.0 512.0 0.0 96.0 36.0 36.0 1968.0 400.0 0.0
o 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 64.0 ... 0.0 192.0 192.0 0.0 96.0 7488.0 0.0 48.0 0.0 0.0
p 3916.0 0.0 0.0 144.0 0.0 256.0 0.0 0.0 0.0 1492.0 ... 0.0 1872.0 1872.0 8124.0 0.0 0.0 3968.0 0.0 0.0 1144.0
r 0.0 0.0 0.0 16.0 0.0 0.0 0.0 0.0 0.0 24.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 72.0 0.0 0.0
s 0.0 32.0 2556.0 0.0 0.0 576.0 0.0 0.0 0.0 0.0 ... 4936.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1248.0 0.0
t 0.0 0.0 0.0 0.0 3376.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 600.0 0.0 0.0 0.0 0.0
u 0.0 0.0 0.0 16.0 0.0 0.0 0.0 0.0 0.0 492.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 48.0 0.0 368.0
v 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4040.0 0.0
w 0.0 0.0 0.0 1040.0 0.0 0.0 0.0 1312.0 0.0 1202.0 ... 0.0 4464.0 4384.0 0.0 7924.0 0.0 0.0 2388.0 0.0 192.0
x 0.0 3656.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
y 0.0 0.0 3244.0 1072.0 0.0 576.0 0.0 0.0 0.0 86.0 ... 284.0 8.0 24.0 0.0 8.0 0.0 0.0 48.0 1712.0 0.0

22 rows × 22 columns

In [69]:
data_counts_plt = data_counts.T.plot(kind='bar',stacked=True, legend=True, figsize=(14,8), fontsize=18)
data_counts_plt.legend(loc=(1.04,0))
Out[69]:
<matplotlib.legend.Legend at 0x1ada6b03d88>
In [70]:
data
Out[70]:
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
0 p x s n t p f c n k ... s w w p w o p k s u
1 e x s y t a f c b k ... s w w p w o p n n g
2 e b s w t l f c b n ... s w w p w o p n n m
3 p x y w t p f c n n ... s w w p w o p k s u
4 e x s g f n f w b k ... s w w p w o e n a g
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
8119 e k s n f n a c b y ... s o o p o o p b c l
8120 e x s n f n a c b y ... s o o p n o p b v l
8121 e f s n f n a c b n ... s o o p o o p b c l
8122 p k y n f y f c n b ... k w w p w o e w v l
8123 e x s n f n a c b y ... s o o p o o p o c l

8124 rows × 22 columns

In [71]:
data.columns
Out[71]:
Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
       'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
       'stalk-shape', 'stalk-surface-above-ring', 'stalk-surface-below-ring',
       'stalk-color-above-ring', 'stalk-color-below-ring', 'veil-type',
       'veil-color', 'ring-number', 'ring-type', 'spore-print-color',
       'population', 'habitat'],
      dtype='object')

Train-Test Split

To split the data into training and testing data I used a 80/20 stratified shuffle split accross 4 folds. I chose to use stratified shuffle split because the data set is fairly small ~8000 instances. For neural networks it is better to use as much data as you can (10s of thousands typically) but our data set should be fine so long as we split it correctly. Thus, using sss allows for the dataset to be evenly split accross features (the value counts of them that is) which is something we want in order to maintain a good ratio. You can see the operation performed below...

In [72]:
from sklearn.model_selection import StratifiedShuffleSplit

train_set = []
test_set = []

# get train and test split
sss = StratifiedShuffleSplit(n_splits=4, test_size=0.2, random_state=42)
for train_index, test_index in sss.split(data, data['class']):
    train_set.append(data.loc[train_index].reset_index(drop=True))
    test_set.append(data.loc[test_index].reset_index(drop=True))
In [73]:
train_set[0].shape
Out[73]:
(6499, 22)
In [74]:
test_set[0].shape
Out[74]:
(1625, 22)
In [75]:
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

feature_columns = [x+'_int' for x in data.columns]
feature_columns.remove('class_int')

for row in range(len(train_set)):
    encoders = dict()
    for col in data.columns:     
        train_set[row][col] = train_set[row][col].str.strip()
        test_set[row][col] = test_set[row][col].str.strip()
        if col == 'class':
            tmp = LabelEncoder()
            train_set[row][col] = tmp.fit_transform(train_set[row][col])
            test_set[row][col] = tmp.transform(test_set[row][col])
        else:
            encoders[col] = LabelEncoder()
            train_set[row][col+'_int'] = encoders[col].fit_transform(train_set[row][col])
            test_set[row][col+'_int'] = encoders[col].fit_transform(test_set[row][col])
In [76]:
train_set[0].head()
Out[76]:
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring_int stalk-color-above-ring_int stalk-color-below-ring_int veil-type_int veil-color_int ring-number_int ring-type_int spore-print-color_int population_int habitat_int
0 1 f y y f f f c b g ... 1 6 0 0 2 1 2 1 5 1
1 0 x s p t n f c b e ... 2 2 7 0 2 2 0 7 1 6
2 0 b s g f n f w b w ... 2 7 7 0 2 2 4 7 3 1
3 1 f s n f s f c n b ... 1 6 7 0 2 1 0 7 4 0
4 1 k y n f f f c n b ... 1 6 6 0 2 1 0 7 4 4

5 rows × 43 columns

In [77]:
test_set[0].head()
Out[77]:
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring_int stalk-color-above-ring_int stalk-color-below-ring_int veil-type_int veil-color_int ring-number_int ring-type_int spore-print-color_int population_int habitat_int
0 1 x y g f f f c b p ... 1 6 4 0 2 1 2 1 5 0
1 1 x s w f c f c n u ... 2 7 7 0 2 1 4 3 3 0
2 0 x f w f n f w b p ... 2 7 7 0 2 1 0 3 3 1
3 1 f s e f f f c n b ... 2 7 7 0 2 1 0 7 4 4
4 1 x f y f f f c b g ... 1 6 4 0 2 1 2 1 5 1

5 rows × 43 columns

In [78]:
# this is where we get out final train and test datasets
y_train = []
X_train = []
y_test = []
X_test = []

for row in range(len(train_set)):
    y_train.append(train_set[row]['class'].values.astype(np.int))
    y_test.append(test_set[row]['class'].values.astype(np.int))
    
    X_train.append(train_set[row].drop(['class'], axis=1))
    X_test.append(test_set[row].drop(['class'], axis=1))

Modeling

Wide and Deep Networks

In [79]:
import keras

keras.__version__
Out[79]:
'2.3.1'
In [80]:
# keras models and layers
from keras.models import Model, Sequential
from keras.layers import Dense, Activation, Input, Dropout, Embedding, Flatten, concatenate
# sklearn encoder, recall, confusion matrix
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import cross_val_score
from sklearn.metrics import recall_score, confusion_matrix, accuracy_score, roc_curve, auc
# graph visualization
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
In [81]:
# define function that prints out the stats
def printStats(history_list,arch):
    acc = []
    loss = []
    val_acc = []
    val_loss = []

    for hist in history_list:
        acc.append(hist.history['accuracy'])
        loss.append(hist.history['loss'])
        val_acc.append(hist.history['val_accuracy'])
        val_loss.append(hist.history['val_loss'])
    
    acc_avg_list = np.mean(acc, axis=0)
    loss_avg_list = np.mean(loss, axis=0)
    val_acc_avg_list = np.mean(val_acc, axis=0)
    val_loss_avg_list = np.mean(val_loss, axis=0)
    
    vals_list = {
        'acc': acc_avg_list,
        'loss': loss_avg_list,
        'val_acc':val_acc_avg_list,
        'val_loss': val_loss_avg_list
    }
    
    acc_avg = np.average(acc)
    loss_avg = np.average(loss)
    val_acc_avg = np.average(val_acc)
    val_loss_avg = np.average(val_loss)
    
    vals = {
        'acc': acc_avg,
        'loss': loss_avg,
        'val_acc':val_acc_avg,
        'val_loss': val_loss_avg
    }

    print("ARCHITECTURE {} - ".format(arch))
    print("Average accuracy: {}".format(acc_avg))
    print("Average loss: {}".format(loss_avg))
    print("Average validation accuracy: {}".format(val_acc_avg))
    print("Average validation loss: {}".format(val_loss_avg))
    
    return vals,vals_list

    
def printStats_legacy(history_list,arch):
    acc = []
    loss = []
    val_acc = []
    val_loss = []

    for hist in history_list:
        acc.append(hist.history['acc'])
        loss.append(hist.history['loss'])
        val_acc.append(hist.history['val_acc'])
        val_loss.append(hist.history['val_loss'])

    acc_avg = np.average(acc)
    loss_avg = np.average(loss)
    val_acc_avg = np.average(val_acc)
    val_loss_avg = np.average(val_loss)

    print("ARCHITECTURE {} - ".format(arch))
    print("Average accuracy: {}".format(acc_avg))
    print("Average loss: {}".format(loss_avg))
    print("Average validation accuracy: {}".format(val_acc_avg))
    print("Average validation loss: {}".format(val_loss_avg))
    
    return acc_avg,loss_avg,val_acc_avg,val_loss_avg

Architecture 1

Architecture 1 Explanation

For my first architecture I chose to use several crossed columns for the wide and a few deep emebedings.

For the wide portion of the network I decided to 6 columns to have a total of 3 crossed columns to be used. As seen below, I chose to cross features that would be easy for a user to distinguish and also features that seemed like they could go together. For example, I chose to cross the stalk color below and above the ring as these could be very similar depending on the mushroom. Same goes for the gill color. Odor and habitat i figured would be interesting to cross as well as there are smells we typically recognize when in a particular biome (i.e swamp).

For the deep portion of the network, I decided to use dropout in between the 3 layers to prevent overfitting of the data. I chose to use 20% and 40% for the 1st and 2nd dropout respectively. This means that in the first layer 20% of the neurons will not be updated, and in the second layer 40% will not be updated. I also used a layer size that shrunk as it got deeper. I started with 128 neurons in the first layer, 64 in the second, and 16 in the third.

In [82]:
# specify crossed columns
cross_columns = [['stalk-color-below-ring','stalk-color-above-ring'],
                 ['gill-color','veil-color'],
                 ['odor','habitat']]

# global model variables
X_train_master = []
X_test_master = []
all_inputs = []

# wide branch variables
all_wide_branch_outputs = []
N_list = []
run_once = True

# obtain integer encoded crossed embedings and data
for i in range(len(X_train)):
    X_ints_train = []
    X_ints_test = []
    
    for cols in cross_columns:
        enc = LabelEncoder()

        X_crossed_train = X_train[i][cols].apply(lambda x: '_'.join(x), axis=1)
        X_crossed_test = X_test[i][cols].apply(lambda x: '_'.join(x), axis=1)
        
        enc.fit(np.hstack((X_crossed_train.values,  X_crossed_test.values)))
        X_crossed_train = enc.transform(X_crossed_train)
        X_crossed_test = enc.transform(X_crossed_test)
        
        X_ints_train.append( X_crossed_train )
        X_ints_test.append( X_crossed_test )
        
        if(run_once):
            N_list.append(max(X_ints_train[-1]+1))
        
        
    for col in feature_columns:
        X_ints_train.append(X_train[i][col].values )
        X_ints_test.append(X_test[i][col].values )
        
    X_train_master.append(X_ints_train)
    X_test_master.append(X_ints_test)
    run_once = False
    
# create the wide branches
for i,n in enumerate(N_list):
    inputs = Input(shape=(1,),dtype='int32', name = '_'.join(cross_columns[i]))
    all_inputs.append(inputs)
    x = Embedding(input_dim=n, 
                  output_dim=int(np.sqrt(n)), 
                  input_length=1, name = '_'.join(cross_columns[i])+'_embed')(inputs)
    x = Flatten()(x)
    all_wide_branch_outputs.append(x)
    
# merge the wide branches together
wide_branch = concatenate(all_wide_branch_outputs, name='wide_concat')
wide_branch = Dense(units=1,activation='sigmoid',name='wide_combined')(wide_branch)

# deep branch variables 
all_deep_branch_outputs = []

# create the deep embedings
for col in feature_columns:
    N = max(X_train[0][col]+1)
    
    inputs = Input(shape=(1,),dtype='int32', name=col)
    all_inputs.append(inputs)
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(inputs)
    x = Flatten()(x)
    all_deep_branch_outputs.append(x)
    
# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs)
deep_branch = Dense(units=128,activation='relu')(deep_branch)
deep_branch = Dropout(0.2, seed=32)(deep_branch)
deep_branch = Dense(units=64,activation='relu')(deep_branch)
deep_branch = Dropout(0.4, seed=32)(deep_branch)
deep_branch = Dense(units=16,activation='relu')(deep_branch)

# merge wide and deep together
final_branch = concatenate([wide_branch, deep_branch],name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',name='combined')(final_branch)

# gen model
model1 = Model(inputs=all_inputs, outputs=final_branch)
In [83]:
# you will need to install pydot properly on your machine to get this running
SVG(model_to_dot(model1).create(prog='dot', format='svg'))
Out[83]:
G 1844705625032 cap-shape_int: InputLayer 1844705626824 cap-shape_int_embed: Embedding 1844705625032->1844705626824 1844705635976 cap-surface_int: InputLayer 1844698431752 cap-surface_int_embed: Embedding 1844705635976->1844698431752 1844698423432 cap-color_int: InputLayer 1844698385032 cap-color_int_embed: Embedding 1844698423432->1844698385032 1844705914568 bruises_int: InputLayer 1844705914184 bruises_int_embed: Embedding 1844705914568->1844705914184 1844706026440 odor_int: InputLayer 1844706070344 odor_int_embed: Embedding 1844706026440->1844706070344 1844706223816 gill-attachment_int: InputLayer 1844698431688 gill-attachment_int_embed: Embedding 1844706223816->1844698431688 1845373829768 gill-spacing_int: InputLayer 1845373878216 gill-spacing_int_embed: Embedding 1845373829768->1845373878216 1845374001864 gill-size_int: InputLayer 1845374049224 gill-size_int_embed: Embedding 1845374001864->1845374049224 1845375814728 gill-color_int: InputLayer 1845375856456 gill-color_int_embed: Embedding 1845375814728->1845375856456 1845375980296 stalk-shape_int: InputLayer 1845376024200 stalk-shape_int_embed: Embedding 1845375980296->1845376024200 1845376147528 stalk-surface-above-ring_int: InputLayer 1845376195208 stalk-surface-above-ring_int_embed: Embedding 1845376147528->1845376195208 1845376324168 stalk-surface-below-ring_int: InputLayer 1845376367496 stalk-surface-below-ring_int_embed: Embedding 1845376324168->1845376367496 1845376496264 stalk-color-above-ring_int: InputLayer 1845376531336 stalk-color-above-ring_int_embed: Embedding 1845376496264->1845376531336 1845376652232 stalk-color-below-ring_int: InputLayer 1845376698120 stalk-color-below-ring_int_embed: Embedding 1845376652232->1845376698120 1845376831944 veil-type_int: InputLayer 1845376871496 veil-type_int_embed: Embedding 1845376831944->1845376871496 1845376999816 veil-color_int: InputLayer 1845377040328 veil-color_int_embed: Embedding 1845376999816->1845377040328 1845377163912 ring-number_int: InputLayer 1845377203080 ring-number_int_embed: Embedding 1845377163912->1845377203080 1845377339976 ring-type_int: InputLayer 1845377383240 ring-type_int_embed: Embedding 1845377339976->1845377383240 1845377503816 spore-print-color_int: InputLayer 1845377543496 spore-print-color_int_embed: Embedding 1845377503816->1845377543496 1845377667720 population_int: InputLayer 1845377710984 population_int_embed: Embedding 1845377667720->1845377710984 1845377831560 habitat_int: InputLayer 1845377887112 habitat_int_embed: Embedding 1845377831560->1845377887112 1844705658696 flatten_75: Flatten 1844705626824->1844705658696 1844698604104 flatten_76: Flatten 1844698431752->1844698604104 1844698443144 flatten_77: Flatten 1844698385032->1844698443144 1844705967496 flatten_78: Flatten 1844705914184->1844705967496 1844706127944 flatten_79: Flatten 1844706070344->1844706127944 1844706282952 flatten_80: Flatten 1844698431688->1844706282952 1845373939336 flatten_81: Flatten 1845373878216->1845373939336 1845375757704 flatten_82: Flatten 1845374049224->1845375757704 1845375917768 flatten_83: Flatten 1845375856456->1845375917768 1845376085320 flatten_84: Flatten 1845376024200->1845376085320 1845376253320 flatten_85: Flatten 1845376195208->1845376253320 1845376421448 flatten_86: Flatten 1845376367496->1845376421448 1845376597576 flatten_87: Flatten 1845376531336->1845376597576 1845376760968 flatten_88: Flatten 1845376698120->1845376760968 1845376929352 flatten_89: Flatten 1845376871496->1845376929352 1845377105480 flatten_90: Flatten 1845377040328->1845377105480 1845377269256 flatten_91: Flatten 1845377203080->1845377269256 1845377445384 flatten_92: Flatten 1845377383240->1845377445384 1845377613384 flatten_93: Flatten 1845377543496->1845377613384 1845377777224 flatten_94: Flatten 1845377710984->1845377777224 1845377941064 flatten_95: Flatten 1845377887112->1845377941064 1844705665416 concatenate_4: Concatenate 1844705658696->1844705665416 1844698604104->1844705665416 1844698443144->1844705665416 1844705967496->1844705665416 1844706127944->1844705665416 1844706282952->1844705665416 1845373939336->1844705665416 1845375757704->1844705665416 1845375917768->1844705665416 1845376085320->1844705665416 1845376253320->1844705665416 1845376421448->1844705665416 1845376597576->1844705665416 1845376760968->1844705665416 1845376929352->1844705665416 1845377105480->1844705665416 1845377269256->1844705665416 1845377445384->1844705665416 1845377613384->1844705665416 1845377777224->1844705665416 1845377941064->1844705665416 1844705595272 dense_8: Dense 1844705665416->1844705595272 1844698535048 stalk-color-below-ring_stalk-color-above-ring: InputLayer 1845334087624 stalk-color-below-ring_stalk-color-above-ring_embed: Embedding 1844698535048->1845334087624 1845365525512 gill-color_veil-color: InputLayer 1844705737160 gill-color_veil-color_embed: Embedding 1845365525512->1844705737160 1844698464200 odor_habitat: InputLayer 1844698407752 odor_habitat_embed: Embedding 1844698464200->1844698407752 1844698563528 dropout_7: Dropout 1844705595272->1844698563528 1844698559048 flatten_72: Flatten 1845334087624->1844698559048 1845371246152 flatten_73: Flatten 1844705737160->1845371246152 1845337092744 flatten_74: Flatten 1844698407752->1845337092744 1845378270408 dense_9: Dense 1844698563528->1845378270408 1845370771848 wide_concat: Concatenate 1844698559048->1845370771848 1845371246152->1845370771848 1845337092744->1845370771848 1845378270024 dropout_8: Dropout 1845378270408->1845378270024 1845370770312 wide_combined: Dense 1845370771848->1845370770312 1845378286984 dense_10: Dense 1845378270024->1845378286984 1845378286024 concat_deep_wide: Concatenate 1845370770312->1845378286024 1845378286984->1845378286024 1845378672584 combined: Dense 1845378286024->1845378672584
In [84]:
%%time

# compile our model
model1.compile(optimizer='adagrad',
              loss='mean_squared_error',
              metrics=['accuracy'])

# save model (4 folds)
model1.save_weights('model_1_weights.h5')

recall_list = []
conf_matrix_list = []
pred_list = []

history_list = []
# loop through folds and fit the data on each fold
for i in range(len(X_train_master)):
    # reset the model
    model1.load_weights('model_1_weights.h5')
    history_list.append(model1.fit(  X_train_master[i],
                                    y_train[i], 
                                    epochs=5, 
                                    batch_size=32, 
                                    verbose=1, 
                                    validation_data = (X_test_master[i], y_test[i])
    ))
    

    pred_list.append(np.round(model1.predict(X_test_master[i])))
    conf_matrix_list.append(confusion_matrix(y_test[i],pred_list[i]))
    recall_list.append(recall_score(y_test[i],pred_list[i]))
    print("-- CONFUSION MATRIX\n {} --".format(conf_matrix_list[i]))
    print("-- RECALL {} --".format(recall_list[i]))
C:\Users\liaml\Anaconda3\envs\mlenv\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning:

Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

Train on 6499 samples, validate on 1625 samples
Epoch 1/5
6499/6499 [==============================] - 2s 346us/step - loss: 0.0191 - accuracy: 0.9800 - val_loss: 0.0025 - val_accuracy: 0.9969
Epoch 2/5
6499/6499 [==============================] - 1s 144us/step - loss: 0.0014 - accuracy: 0.9989 - val_loss: 0.0019 - val_accuracy: 0.9982
Epoch 3/5
6499/6499 [==============================] - 1s 143us/step - loss: 9.5644e-04 - accuracy: 0.9992 - val_loss: 0.0018 - val_accuracy: 0.9982
Epoch 4/5
6499/6499 [==============================] - 1s 143us/step - loss: 8.7147e-04 - accuracy: 0.9992 - val_loss: 0.0018 - val_accuracy: 0.9982
Epoch 5/5
6499/6499 [==============================] - 1s 142us/step - loss: 8.1136e-04 - accuracy: 0.9992 - val_loss: 0.0018 - val_accuracy: 0.9982
-- CONFUSION MATRIX
 [[842   0]
 [  3 780]] --
-- RECALL 0.9961685823754789 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/5
6499/6499 [==============================] - 1s 142us/step - loss: 0.0574 - accuracy: 0.9254 - val_loss: 0.0081 - val_accuracy: 0.9914
Epoch 2/5
6499/6499 [==============================] - 1s 144us/step - loss: 0.0052 - accuracy: 0.9952 - val_loss: 7.3843e-04 - val_accuracy: 1.0000
Epoch 3/5
6499/6499 [==============================] - 1s 145us/step - loss: 0.0023 - accuracy: 0.9983 - val_loss: 3.0503e-04 - val_accuracy: 1.0000
Epoch 4/5
6499/6499 [==============================] - 1s 144us/step - loss: 0.0019 - accuracy: 0.9980 - val_loss: 8.4528e-05 - val_accuracy: 1.0000
Epoch 5/5
6499/6499 [==============================] - 1s 145us/step - loss: 0.0015 - accuracy: 0.9988 - val_loss: 4.0471e-05 - val_accuracy: 1.0000
-- CONFUSION MATRIX
 [[842   0]
 [  0 783]] --
-- RECALL 1.0 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/5
6499/6499 [==============================] - 1s 145us/step - loss: 0.1122 - accuracy: 0.8615 - val_loss: 0.0110 - val_accuracy: 0.9895
Epoch 2/5
6499/6499 [==============================] - 1s 145us/step - loss: 0.0110 - accuracy: 0.9889 - val_loss: 0.0057 - val_accuracy: 0.9932
Epoch 3/5
6499/6499 [==============================] - 1s 144us/step - loss: 0.0039 - accuracy: 0.9963 - val_loss: 9.9844e-04 - val_accuracy: 0.9994
Epoch 4/5
6499/6499 [==============================] - 1s 143us/step - loss: 0.0022 - accuracy: 0.9985 - val_loss: 3.7186e-04 - val_accuracy: 1.0000
Epoch 5/5
6499/6499 [==============================] - 1s 145us/step - loss: 0.0017 - accuracy: 0.9986 - val_loss: 2.2408e-04 - val_accuracy: 1.0000
-- CONFUSION MATRIX
 [[842   0]
 [  0 783]] --
-- RECALL 1.0 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/5
6499/6499 [==============================] - 1s 146us/step - loss: 0.1666 - accuracy: 0.7653 - val_loss: 0.0207 - val_accuracy: 0.9834
Epoch 2/5
6499/6499 [==============================] - 1s 145us/step - loss: 0.0147 - accuracy: 0.9874 - val_loss: 0.0117 - val_accuracy: 0.9883
Epoch 3/5
6499/6499 [==============================] - 1s 145us/step - loss: 0.0107 - accuracy: 0.9897 - val_loss: 0.0112 - val_accuracy: 0.9883
Epoch 4/5
6499/6499 [==============================] - 1s 144us/step - loss: 0.0095 - accuracy: 0.9906 - val_loss: 0.0109 - val_accuracy: 0.9883
Epoch 5/5
6499/6499 [==============================] - 1s 145us/step - loss: 0.0083 - accuracy: 0.9908 - val_loss: 0.0047 - val_accuracy: 0.9932
-- CONFUSION MATRIX
 [[842   0]
 [ 11 772]] --
-- RECALL 0.9859514687100894 --
Wall time: 24.2 s
In [85]:
# printStats(history_list, "ARCHITECTURE 1")
arch1_vals_list,arch1_mean_list = printStats(history_list, "ARCHITECTURE 1")
ARCHITECTURE ARCHITECTURE 1 - 
Average accuracy: 0.9729958772659302
Average loss: 0.02161291712557214
Average validation accuracy: 0.9952307790517807
Average validation loss: 0.004835556695265236
In [86]:
arch1_recall = np.average(recall_list)
print("AVG Recall: {}".format(arch1_recall))
AVG Recall: 0.9955300127713921
In [87]:
plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(arch1_mean_list['acc'])

plt.ylabel('Accuracy %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(arch1_mean_list['val_acc'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(arch1_mean_list['loss'])
plt.ylabel('Cross Entropy Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(arch1_mean_list['val_loss'])
plt.xlabel('epochs')
Out[87]:
Text(0.5, 0, 'epochs')
In [88]:
# obtain ROC values for architecture 1
y_pred1 = model1.predict(X_test_master[0])
#false positve and true postive rates using roc
fpr_1, tpr_1, thresholds_1 = roc_curve(y_test[0], y_pred1)
#area under the curve
auc_1 = auc(fpr_1, tpr_1)

As seen above, the recall score was just shy of being 100% on average, however, in 2 of the folds we actually did see a score of 100%. This is great to see as (ideally) we really don't want to have any false negatives in our results as it could lead to a potentially fatal outcome.

One other thing that can be seen is that there may be some over-training going on here. In the next architecture I will attempt to fix that by changing up the model.

Architecture 2

Architecture 2 Explanation

In my second architecture I chose to use a similar structure to that of my first architecture with a few changes to try an prevent overfitting of the data.

The wide portion of this architecture should function exactly as the architecture 1. I decided to keep the same cross embedings as they seemed to make sense and work well.

For the deep portion of the network I use 2 deep layers and 2 dropout layers. The first deep layer contained 128 neurons which was then followed by a 40% dropout. The second deep layer contained 64 neurons which was followed by a 60% dropout.

I hope to see improvement in the average loss & validation loss scores such that they are a bit more similar in magnitude.

In [89]:
cross_columns = [['stalk-color-below-ring','stalk-color-above-ring'],
                 ['gill-color','veil-color'],
                 ['odor','habitat']]

# global model variables
X_train_master = []
X_test_master = []
all_inputs = []

# wide branch variables
all_wide_branch_outputs = []
N_list = []
run_once = True

# obtain integer encoded crossed embedings and data
for i in range(len(X_train)):
    X_ints_train = []
    X_ints_test = []
    
    for cols in cross_columns:
        enc = LabelEncoder()

        X_crossed_train = X_train[i][cols].apply(lambda x: '_'.join(x), axis=1)
        X_crossed_test = X_test[i][cols].apply(lambda x: '_'.join(x), axis=1)
        
        enc.fit(np.hstack((X_crossed_train.values,  X_crossed_test.values)))
        X_crossed_train = enc.transform(X_crossed_train)
        X_crossed_test = enc.transform(X_crossed_test)
        
        X_ints_train.append( X_crossed_train )
        X_ints_test.append( X_crossed_test )
        
        if(run_once):
            N_list.append(max(X_ints_train[-1]+1))
        
        
    for col in feature_columns:
        X_ints_train.append(X_train[i][col].values )
        X_ints_test.append(X_test[i][col].values )
        
    X_train_master.append(X_ints_train)
    X_test_master.append(X_ints_test)
    run_once = False
    
# create the wide branches
for i,n in enumerate(N_list):
    # create embedding branch from the number of categories cross_columns[i]
    inputs = Input(shape=(1,),dtype='int32', name = '_'.join(cross_columns[i]))
    all_inputs.append(inputs)
    x = Embedding(input_dim=n, 
                  output_dim=int(np.sqrt(n)), 
                  input_length=1, name = '_'.join(cross_columns[i])+'_embed')(inputs)
    x = Flatten()(x)
    all_wide_branch_outputs.append(x)
    
# merge the wide branches together
wide_branch = concatenate(all_wide_branch_outputs, name='wide_concat')
wide_branch = Dense(units=1,activation='sigmoid',name='wide_combined')(wide_branch)

# deep branch variables 
all_deep_branch_outputs = []

# create the deep embedings
for col in feature_columns:
    # get the number of categories
    N = max(X_train[0][col]+1) # same as the max(df_train[col])
    
    # create embedding branch from the number of categories
    inputs = Input(shape=(1,),dtype='int32', name=col)
    all_inputs.append(inputs)
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(inputs)
    x = Flatten()(x)
    all_deep_branch_outputs.append(x)
    
# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs)
deep_branch = Dense(units=128,activation='relu')(deep_branch)
deep_branch = Dropout(0.4, seed=32)(deep_branch)
deep_branch = Dense(units=64,activation='relu')(deep_branch)
deep_branch = Dropout(0.6, seed=32)(deep_branch)

# merge wide and deep together
final_branch = concatenate([wide_branch, deep_branch],name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',name='combined')(final_branch)

# gen model
model2 = Model(inputs=all_inputs, outputs=final_branch)
WARNING:tensorflow:Large dropout rate: 0.6 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
In [90]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

# you will need to install pydot properly on your machine to get this running
SVG(model_to_dot(model2).create(prog='dot', format='svg'))
Out[90]:
G 1844874477512 cap-shape_int: InputLayer 1844874477320 cap-shape_int_embed: Embedding 1844874477512->1844874477320 1844874483976 cap-surface_int: InputLayer 1844874434440 cap-surface_int_embed: Embedding 1844874483976->1844874434440 1844874339784 cap-color_int: InputLayer 1844899846344 cap-color_int_embed: Embedding 1844874339784->1844899846344 1844899899464 bruises_int: InputLayer 1844899430024 bruises_int_embed: Embedding 1844899899464->1844899430024 1844899358664 odor_int: InputLayer 1844898961480 odor_int_embed: Embedding 1844899358664->1844898961480 1844898990024 gill-attachment_int: InputLayer 1844874434760 gill-attachment_int_embed: Embedding 1844898990024->1844874434760 1844867081800 gill-spacing_int: InputLayer 1844867047112 gill-spacing_int_embed: Embedding 1844867081800->1844867047112 1844866971912 gill-size_int: InputLayer 1844911124168 gill-size_int_embed: Embedding 1844866971912->1844911124168 1844911109384 gill-color_int: InputLayer 1844867640968 gill-color_int_embed: Embedding 1844911109384->1844867640968 1844867552648 stalk-shape_int: InputLayer 1844867448264 stalk-shape_int_embed: Embedding 1844867552648->1844867448264 1844877370376 stalk-surface-above-ring_int: InputLayer 1844877233736 stalk-surface-above-ring_int_embed: Embedding 1844877370376->1844877233736 1844896691016 stalk-surface-below-ring_int: InputLayer 1844896738760 stalk-surface-below-ring_int_embed: Embedding 1844896691016->1844896738760 1844897520456 stalk-color-above-ring_int: InputLayer 1844897569992 stalk-color-above-ring_int_embed: Embedding 1844897520456->1844897569992 1844897454792 stalk-color-below-ring_int: InputLayer 1844898286664 stalk-color-below-ring_int_embed: Embedding 1844897454792->1844898286664 1844898277960 veil-type_int: InputLayer 1844898188232 veil-type_int_embed: Embedding 1844898277960->1844898188232 1844897892552 veil-color_int: InputLayer 1844897857864 veil-color_int_embed: Embedding 1844897892552->1844897857864 1844867378568 ring-number_int: InputLayer 1844867415880 ring-number_int_embed: Embedding 1844867378568->1844867415880 1844867250120 ring-type_int: InputLayer 1844867180680 ring-type_int_embed: Embedding 1844867250120->1844867180680 1844873410440 spore-print-color_int: InputLayer 1844873374408 spore-print-color_int_embed: Embedding 1844873410440->1844873374408 1844876076808 population_int: InputLayer 1844875935304 population_int_embed: Embedding 1844876076808->1844875935304 1844896438792 habitat_int: InputLayer 1844896578120 habitat_int_embed: Embedding 1844896438792->1844896578120 1844874307208 flatten_99: Flatten 1844874477320->1844874307208 1844899839880 flatten_100: Flatten 1844874434440->1844899839880 1844899471176 flatten_101: Flatten 1844899846344->1844899471176 1844899297544 flatten_102: Flatten 1844899430024->1844899297544 1844899081928 flatten_103: Flatten 1844898961480->1844899081928 1844867096392 flatten_104: Flatten 1844874434760->1844867096392 1844911275592 flatten_105: Flatten 1844867047112->1844911275592 1844911026440 flatten_106: Flatten 1844911124168->1844911026440 1844867535048 flatten_107: Flatten 1844867640968->1844867535048 1844877316232 flatten_108: Flatten 1844867448264->1844877316232 1844896748296 flatten_109: Flatten 1844877233736->1844896748296 1844896617672 flatten_110: Flatten 1844896738760->1844896617672 1844897424456 flatten_111: Flatten 1844897569992->1844897424456 1844898375752 flatten_112: Flatten 1844898286664->1844898375752 1844897892360 flatten_113: Flatten 1844898188232->1844897892360 1844867375496 flatten_114: Flatten 1844897857864->1844867375496 1844867248264 flatten_115: Flatten 1844867415880->1844867248264 1844873408584 flatten_116: Flatten 1844867180680->1844873408584 1844876075080 flatten_117: Flatten 1844873374408->1844876075080 1844875938056 flatten_118: Flatten 1844875935304->1844875938056 1844896557768 flatten_119: Flatten 1844896578120->1844896557768 1844898746568 concatenate_5: Concatenate 1844874307208->1844898746568 1844899839880->1844898746568 1844899471176->1844898746568 1844899297544->1844898746568 1844899081928->1844898746568 1844867096392->1844898746568 1844911275592->1844898746568 1844911026440->1844898746568 1844867535048->1844898746568 1844877316232->1844898746568 1844896748296->1844898746568 1844896617672->1844898746568 1844897424456->1844898746568 1844898375752->1844898746568 1844897892360->1844898746568 1844867375496->1844898746568 1844867248264->1844898746568 1844873408584->1844898746568 1844876075080->1844898746568 1844875938056->1844898746568 1844896557768->1844898746568 1844873592776 stalk-color-below-ring_stalk-color-above-ring: InputLayer 1844898797192 stalk-color-below-ring_stalk-color-above-ring_embed: Embedding 1844873592776->1844898797192 1844873589832 gill-color_veil-color: InputLayer 1844873591240 gill-color_veil-color_embed: Embedding 1844873589832->1844873591240 1844873631560 odor_habitat: InputLayer 1844873595400 odor_habitat_embed: Embedding 1844873631560->1844873595400 1844898898440 dense_11: Dense 1844898746568->1844898898440 1844873589192 flatten_96: Flatten 1844898797192->1844873589192 1844874425160 flatten_97: Flatten 1844873591240->1844874425160 1844898883784 flatten_98: Flatten 1844873595400->1844898883784 1844873688392 dropout_9: Dropout 1844898898440->1844873688392 1845338114184 wide_concat: Concatenate 1844873589192->1845338114184 1844874425160->1845338114184 1844898883784->1845338114184 1844901189384 dense_12: Dense 1844873688392->1844901189384 1845336115144 wide_combined: Dense 1845338114184->1845336115144 1844901189000 dropout_10: Dropout 1844901189384->1844901189000 1844897087048 concat_deep_wide: Concatenate 1845336115144->1844897087048 1844901189000->1844897087048 1844897085320 combined: Dense 1844897087048->1844897085320
In [91]:
%%time

# compile our model
model2.compile(optimizer='adagrad',
              loss='mean_squared_error',
              metrics=['accuracy'])

# save model (4 folds)
model2.save_weights('model_2_weights.h5')

recall_list_2 = []
conf_matrix_list_2 = []
pred_list_2 = []

history_list_2 = []
# loop through folds and fit the data on each fold
for i in range(len(X_train_master)):
    # reset the model
    model2.load_weights('model_2_weights.h5')
    history_list_2.append(model2.fit(  X_train_master[i],
                                    y_train[i], 
                                    epochs=5, 
                                    batch_size=32, 
                                    verbose=1, 
                                    validation_data = (X_test_master[i], y_test[i])
    ))
    

    pred_list_2.append(np.round(model2.predict(X_test_master[i])))
    conf_matrix_list_2.append(confusion_matrix(y_test[i],pred_list_2[i]))
    recall_list_2.append(recall_score(y_test[i],pred_list_2[i]))
    print("-- CONFUSION MATRIX\n {} --".format(conf_matrix_list_2[i]))
    print("-- RECALL {} --".format(recall_list_2[i]))
C:\Users\liaml\Anaconda3\envs\mlenv\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning:

Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

Train on 6499 samples, validate on 1625 samples
Epoch 1/5
6499/6499 [==============================] - 2s 334us/step - loss: 0.0213 - accuracy: 0.9786 - val_loss: 0.0034 - val_accuracy: 0.9963
Epoch 2/5
6499/6499 [==============================] - 1s 129us/step - loss: 0.0022 - accuracy: 0.9988 - val_loss: 0.0022 - val_accuracy: 0.9982
Epoch 3/5
6499/6499 [==============================] - 1s 131us/step - loss: 0.0016 - accuracy: 0.9986 - val_loss: 0.0019 - val_accuracy: 0.9982
Epoch 4/5
6499/6499 [==============================] - 1s 129us/step - loss: 0.0011 - accuracy: 0.9992 - val_loss: 0.0018 - val_accuracy: 0.9982
Epoch 5/5
6499/6499 [==============================] - 1s 129us/step - loss: 9.7042e-04 - accuracy: 0.9992 - val_loss: 0.0018 - val_accuracy: 0.9982
-- CONFUSION MATRIX
 [[842   0]
 [  3 780]] --
-- RECALL 0.9961685823754789 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/5
6499/6499 [==============================] - 1s 128us/step - loss: 0.0854 - accuracy: 0.9020 - val_loss: 0.0099 - val_accuracy: 0.9840
Epoch 2/5
6499/6499 [==============================] - 1s 128us/step - loss: 0.0084 - accuracy: 0.9922 - val_loss: 0.0025 - val_accuracy: 0.9994
Epoch 3/5
6499/6499 [==============================] - 1s 128us/step - loss: 0.0039 - accuracy: 0.9974 - val_loss: 7.3759e-04 - val_accuracy: 1.0000
Epoch 4/5
6499/6499 [==============================] - 1s 131us/step - loss: 0.0026 - accuracy: 0.9978 - val_loss: 3.9850e-04 - val_accuracy: 1.0000
Epoch 5/5
6499/6499 [==============================] - 1s 130us/step - loss: 0.0018 - accuracy: 0.9985 - val_loss: 2.8789e-04 - val_accuracy: 1.0000
-- CONFUSION MATRIX
 [[842   0]
 [  0 783]] --
-- RECALL 1.0 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/5
6499/6499 [==============================] - 1s 129us/step - loss: 0.1457 - accuracy: 0.8320 - val_loss: 0.0341 - val_accuracy: 0.9742
Epoch 2/5
6499/6499 [==============================] - 1s 130us/step - loss: 0.0205 - accuracy: 0.9817 - val_loss: 0.0057 - val_accuracy: 0.9914
Epoch 3/5
6499/6499 [==============================] - 1s 128us/step - loss: 0.0074 - accuracy: 0.9940 - val_loss: 0.0021 - val_accuracy: 0.9994
Epoch 4/5
6499/6499 [==============================] - 1s 129us/step - loss: 0.0046 - accuracy: 0.9968 - val_loss: 9.2390e-04 - val_accuracy: 1.0000
Epoch 5/5
6499/6499 [==============================] - 1s 130us/step - loss: 0.0029 - accuracy: 0.9982 - val_loss: 5.7960e-04 - val_accuracy: 1.0000
-- CONFUSION MATRIX
 [[842   0]
 [  0 783]] --
-- RECALL 1.0 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/5
6499/6499 [==============================] - 1s 130us/step - loss: 0.1870 - accuracy: 0.7807 - val_loss: 0.0765 - val_accuracy: 0.9163
Epoch 2/5
6499/6499 [==============================] - 1s 130us/step - loss: 0.0463 - accuracy: 0.9521 - val_loss: 0.0162 - val_accuracy: 0.9828
Epoch 3/5
6499/6499 [==============================] - 1s 131us/step - loss: 0.0134 - accuracy: 0.9877 - val_loss: 0.0072 - val_accuracy: 0.9914
Epoch 4/5
6499/6499 [==============================] - 1s 130us/step - loss: 0.0070 - accuracy: 0.9938 - val_loss: 0.0039 - val_accuracy: 0.9975
Epoch 5/5
6499/6499 [==============================] - 1s 128us/step - loss: 0.0041 - accuracy: 0.9977 - val_loss: 0.0028 - val_accuracy: 0.9982
-- CONFUSION MATRIX
 [[842   0]
 [  3 780]] --
-- RECALL 0.9961685823754789 --
Wall time: 22.4 s
In [92]:
# printStats_v2_4(history_list, "ARCHITECTURE 2")
arch2_vals_list,arch2_mean_list = printStats(history_list_2, "ARCHITECTURE 2")
ARCHITECTURE ARCHITECTURE 2 - 
Average accuracy: 0.9688490033149719
Average loss: 0.028404633855912235
Average validation accuracy: 0.9911692380905152
Average validation loss: 0.008744909486035613
In [93]:
arch2_recall = np.average(recall_list_2)
print("AVG Recall: {}".format(arch2_recall))
AVG Recall: 0.9980842911877394
In [94]:
plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(arch2_mean_list['acc'])

plt.ylabel('Accuracy %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(arch2_mean_list['val_acc'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(arch2_mean_list['loss'])
plt.ylabel('Cross Entropy Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(arch2_mean_list['val_loss'])
plt.xlabel('epochs')
Out[94]:
Text(0.5, 0, 'epochs')
In [95]:
# obtain ROC values for architecture 1
y_pred2 = model2.predict(X_test_master[0])
#false positve and true postive rates using roc
fpr_2, tpr_2, thresholds_2 = roc_curve(y_test[0], y_pred2)
#area under the curve
auc_2 = auc(fpr_2, tpr_2)

In this model, the recall score was just shy of being 100% on average but was improved from the previous architecture by 0.3%. The change is not that much but it is still better than before and we seriously want to ensure that the score is as close to 100% as possible.

Unfortunately our losses compared the the first model are pretty much exactly the same in terms of ratio. This is unfortunate as it could signify there is still over-training occuring.

Architecture 3

Architecture Explanation

In my third architecture I plan to use a similar architecture as architecture 2 but with a changeup in the crossed-features.

For the wide portion of my network I will now be crossing population-habitat-odor and bruises-habitat. The reason I chose to cross these like this is because there could be a strong relation for more easily identifyable characteristics that could improve classification score. Some of the characteristics in the data set are quite difficult to classify like 'gill-spacing' or any of the characteristics that require touch. If someone were to be strolling through the forest in search of a tasty mushroom, they could more easily generalize an area of mushrooms by these features without getting really close (think of an app and you see a bunch of mushrooms). I also crossed bruises and habitat to see if there is a strong correlation there as well.

For the deep portion of the network, I brought down the first dropout rate from 40% to 30%, leaving the second at 60%, and I also reduced the neurons used in the first and second networks to 100 and 50.

In [105]:
cross_columns = [['population','habitat','odor'],
                 ['bruises','habitat']]

# global model variables
X_train_master = []
X_test_master = []
all_inputs = []

# wide branch variables
all_wide_branch_outputs = []
N_list = []
run_once = True

# obtain integer encoded crossed embedings and data
for i in range(len(X_train)):
    X_ints_train = []
    X_ints_test = []
    
    for cols in cross_columns:
        enc = LabelEncoder()

        X_crossed_train = X_train[i][cols].apply(lambda x: '_'.join(x), axis=1)
        X_crossed_test = X_test[i][cols].apply(lambda x: '_'.join(x), axis=1)
        
        enc.fit(np.hstack((X_crossed_train.values,  X_crossed_test.values)))
        X_crossed_train = enc.transform(X_crossed_train)
        X_crossed_test = enc.transform(X_crossed_test)
        
        X_ints_train.append( X_crossed_train )
        X_ints_test.append( X_crossed_test )
        
        if(run_once):
            N_list.append(max(X_ints_train[-1]+1))
        
        
    for col in feature_columns:
        X_ints_train.append(X_train[i][col].values )
        X_ints_test.append(X_test[i][col].values )
        
    X_train_master.append(X_ints_train)
    X_test_master.append(X_ints_test)
    run_once = False
    
# create the wide branches
for i,n in enumerate(N_list):
    # create embedding branch from the number of categories cross_columns[i]
    inputs = Input(shape=(1,),dtype='int32', name = '_'.join(cross_columns[i]))
    all_inputs.append(inputs)
    x = Embedding(input_dim=n, 
                  output_dim=int(np.sqrt(n)), 
                  input_length=1, name = '_'.join(cross_columns[i])+'_embed')(inputs)
    x = Flatten()(x)
    all_wide_branch_outputs.append(x)
    
# merge the wide branches together
wide_branch = concatenate(all_wide_branch_outputs, name='wide_concat')
wide_branch = Dense(units=1,activation='sigmoid',name='wide_combined')(wide_branch)

# deep branch variables 
all_deep_branch_outputs = []

# create the deep embedings
for col in feature_columns:
    # get the number of categories
    N = max(X_train[0][col]+1) # same as the max(df_train[col])
    
    # create embedding branch from the number of categories
    inputs = Input(shape=(1,),dtype='int32', name=col)
    all_inputs.append(inputs)
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(inputs)
    x = Flatten()(x)
    all_deep_branch_outputs.append(x)
    
# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs)
deep_branch = Dense(units=100,activation='relu')(deep_branch)
deep_branch = Dropout(0.3, seed=32)(deep_branch)
deep_branch = Dense(units=50,activation='relu')(deep_branch)
deep_branch = Dropout(0.6, seed=32)(deep_branch)

# merge wide and deep together
final_branch = concatenate([wide_branch, deep_branch],name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',name='combined')(final_branch)

# gen model
model3 = Model(inputs=all_inputs, outputs=final_branch)
WARNING:tensorflow:Large dropout rate: 0.6 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
In [106]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

# you will need to install pydot properly on your machine to get this running
SVG(model_to_dot(model3).create(prog='dot', format='svg'))
Out[106]:
G 1845532303880 cap-shape_int: InputLayer 1845532304648 cap-shape_int_embed: Embedding 1845532303880->1845532304648 1845532553032 cap-surface_int: InputLayer 1845532408968 cap-surface_int_embed: Embedding 1845532553032->1845532408968 1845532620680 cap-color_int: InputLayer 1845532568200 cap-color_int_embed: Embedding 1845532620680->1845532568200 1845532791752 bruises_int: InputLayer 1845532847304 bruises_int_embed: Embedding 1845532791752->1845532847304 1845532921480 odor_int: InputLayer 1845532962568 odor_int_embed: Embedding 1845532921480->1845532962568 1845533128456 gill-attachment_int: InputLayer 1845532008072 gill-attachment_int_embed: Embedding 1845533128456->1845532008072 1845533263496 gill-spacing_int: InputLayer 1845532568776 gill-spacing_int_embed: Embedding 1845533263496->1845532568776 1845533438664 gill-size_int: InputLayer 1845533472968 gill-size_int_embed: Embedding 1845533438664->1845533472968 1845533607304 gill-color_int: InputLayer 1845533640520 gill-color_int_embed: Embedding 1845533607304->1845533640520 1845533771848 stalk-shape_int: InputLayer 1845533814344 stalk-shape_int_embed: Embedding 1845533771848->1845533814344 1845533943048 stalk-surface-above-ring_int: InputLayer 1845533985800 stalk-surface-above-ring_int_embed: Embedding 1845533943048->1845533985800 1845534114760 stalk-surface-below-ring_int: InputLayer 1845534153736 stalk-surface-below-ring_int_embed: Embedding 1845534114760->1845534153736 1845534282568 stalk-color-above-ring_int: InputLayer 1845534329736 stalk-color-above-ring_int_embed: Embedding 1845534282568->1845534329736 1845534498184 stalk-color-below-ring_int: InputLayer 1845534497544 stalk-color-below-ring_int_embed: Embedding 1845534498184->1845534497544 1845534626440 veil-type_int: InputLayer 1845534669448 veil-type_int_embed: Embedding 1845534626440->1845534669448 1845534798280 veil-color_int: InputLayer 1845534842440 veil-color_int_embed: Embedding 1845534798280->1845534842440 1845537259848 ring-number_int: InputLayer 1845537298824 ring-number_int_embed: Embedding 1845537259848->1845537298824 1845537427656 ring-type_int: InputLayer 1845537466632 ring-type_int_embed: Embedding 1845537427656->1845537466632 1845537595528 spore-print-color_int: InputLayer 1845537639752 spore-print-color_int_embed: Embedding 1845537595528->1845537639752 1845537708680 population_int: InputLayer 1845537806472 population_int_embed: Embedding 1845537708680->1845537806472 1845537872456 habitat_int: InputLayer 1845537995208 habitat_int_embed: Embedding 1845537872456->1845537995208 1845532210632 flatten_145: Flatten 1845532304648->1845532210632 1845532449992 flatten_146: Flatten 1845532408968->1845532449992 1845532762824 flatten_147: Flatten 1845532568200->1845532762824 1845532918216 flatten_148: Flatten 1845532847304->1845532918216 1845533099976 flatten_149: Flatten 1845532962568->1845533099976 1845533263432 flatten_150: Flatten 1845532008072->1845533263432 1845533295304 flatten_151: Flatten 1845532568776->1845533295304 1845533526728 flatten_152: Flatten 1845533472968->1845533526728 1845533707976 flatten_153: Flatten 1845533640520->1845533707976 1845533879752 flatten_154: Flatten 1845533814344->1845533879752 1845534051912 flatten_155: Flatten 1845533985800->1845534051912 1845534219720 flatten_156: Flatten 1845534153736->1845534219720 1845534395720 flatten_157: Flatten 1845534329736->1845534395720 1845534563400 flatten_158: Flatten 1845534497544->1845534563400 1845534731208 flatten_159: Flatten 1845534669448->1845534731208 1845534907656 flatten_160: Flatten 1845534842440->1845534907656 1845537364936 flatten_161: Flatten 1845537298824->1845537364936 1845537540296 flatten_162: Flatten 1845537466632->1845537540296 1845537771592 flatten_163: Flatten 1845537639752->1845537771592 1845537935432 flatten_164: Flatten 1845537806472->1845537935432 1845538115528 flatten_165: Flatten 1845537995208->1845538115528 1845532814600 concatenate_7: Concatenate 1845532210632->1845532814600 1845532449992->1845532814600 1845532762824->1845532814600 1845532918216->1845532814600 1845533099976->1845532814600 1845533263432->1845532814600 1845533295304->1845532814600 1845533526728->1845532814600 1845533707976->1845532814600 1845533879752->1845532814600 1845534051912->1845532814600 1845534219720->1845532814600 1845534395720->1845532814600 1845534563400->1845532814600 1845534731208->1845532814600 1845534907656->1845532814600 1845537364936->1845532814600 1845537540296->1845532814600 1845537771592->1845532814600 1845537935432->1845532814600 1845538115528->1845532814600 1845532705352 population_habitat_odor: InputLayer 1845297915272 population_habitat_odor_embed: Embedding 1845532705352->1845297915272 1845532707400 bruises_habitat: InputLayer 1845532706504 bruises_habitat_embed: Embedding 1845532707400->1845532706504 1845532130760 dense_15: Dense 1845532814600->1845532130760 1845532707272 flatten_143: Flatten 1845297915272->1845532707272 1845532706696 flatten_144: Flatten 1845532706504->1845532706696 1845532562696 dropout_13: Dropout 1845532130760->1845532562696 1845532705416 wide_concat: Concatenate 1845532707272->1845532705416 1845532706696->1845532705416 1845538328136 dense_16: Dense 1845532562696->1845538328136 1845531947848 wide_combined: Dense 1845532705416->1845531947848 1845538328456 dropout_14: Dropout 1845538328136->1845538328456 1845538384840 concat_deep_wide: Concatenate 1845531947848->1845538384840 1845538328456->1845538384840 1845538385416 combined: Dense 1845538384840->1845538385416
In [107]:
%%time

# compile our model
model3.compile(optimizer='adagrad',
              loss='mean_squared_error',
              metrics=['accuracy'])

# save model (4 folds)
model3.save_weights('model_3_weights.h5')

recall_list_3 = []
conf_matrix_list_3 = []
pred_list_3 = []

history_list_3 = []
# loop through folds and fit the data on each fold
for i in range(len(X_train_master)):
    # reset the model
    model3.load_weights('model_3_weights.h5')
    history_list_3.append(model3.fit(  X_train_master[i],
                                    y_train[i], 
                                    epochs=7, 
                                    batch_size=32, 
                                    verbose=1, 
                                    validation_data = (X_test_master[i], y_test[i])
    ))
    

    pred_list_3.append(np.round(model3.predict(X_test_master[i])))
    conf_matrix_list_3.append(confusion_matrix(y_test[i],pred_list_3[i]))
    recall_list_3.append(recall_score(y_test[i],pred_list_3[i]))
    print("-- CONFUSION MATRIX\n {} --".format(conf_matrix_list_3[i]))
    print("-- RECALL {} --".format(recall_list_3[i]))
C:\Users\liaml\Anaconda3\envs\mlenv\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning:

Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

Train on 6499 samples, validate on 1625 samples
Epoch 1/7
6499/6499 [==============================] - 2s 330us/step - loss: 0.0244 - accuracy: 0.9765 - val_loss: 0.0035 - val_accuracy: 0.9963
Epoch 2/7
6499/6499 [==============================] - 1s 146us/step - loss: 0.0026 - accuracy: 0.9983 - val_loss: 0.0019 - val_accuracy: 0.9982
Epoch 3/7
6499/6499 [==============================] - 1s 142us/step - loss: 0.0015 - accuracy: 0.9988 - val_loss: 0.0019 - val_accuracy: 0.9982
Epoch 4/7
6499/6499 [==============================] - 1s 139us/step - loss: 0.0010 - accuracy: 0.9992 - val_loss: 0.0017 - val_accuracy: 0.9982
Epoch 5/7
6499/6499 [==============================] - 1s 136us/step - loss: 0.0010 - accuracy: 0.9992 - val_loss: 0.0016 - val_accuracy: 0.9982
Epoch 6/7
6499/6499 [==============================] - 1s 138us/step - loss: 9.3823e-04 - accuracy: 0.9992 - val_loss: 0.0014 - val_accuracy: 0.9982
Epoch 7/7
6499/6499 [==============================] - 1s 140us/step - loss: 6.7846e-04 - accuracy: 0.9994 - val_loss: 6.4121e-04 - val_accuracy: 0.9988
-- CONFUSION MATRIX
 [[842   0]
 [  2 781]] --
-- RECALL 0.9974457215836526 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/7
6499/6499 [==============================] - 1s 146us/step - loss: 0.0891 - accuracy: 0.9011 - val_loss: 0.0108 - val_accuracy: 0.9815
Epoch 2/7
6499/6499 [==============================] - 1s 143us/step - loss: 0.0106 - accuracy: 0.9905 - val_loss: 0.0032 - val_accuracy: 0.9994
Epoch 3/7
6499/6499 [==============================] - 1s 146us/step - loss: 0.0049 - accuracy: 0.9969 - val_loss: 0.0013 - val_accuracy: 1.0000
Epoch 4/7
6499/6499 [==============================] - 1s 143us/step - loss: 0.0030 - accuracy: 0.9982 - val_loss: 8.1664e-04 - val_accuracy: 1.0000
Epoch 5/7
6499/6499 [==============================] - 1s 140us/step - loss: 0.0022 - accuracy: 0.9986 - val_loss: 9.6176e-04 - val_accuracy: 0.9988
Epoch 6/7
6499/6499 [==============================] - 1s 144us/step - loss: 0.0020 - accuracy: 0.9983 - val_loss: 4.9102e-04 - val_accuracy: 1.0000
Epoch 7/7
6499/6499 [==============================] - 1s 150us/step - loss: 0.0013 - accuracy: 0.9995 - val_loss: 2.8623e-04 - val_accuracy: 1.0000
-- CONFUSION MATRIX
 [[842   0]
 [  0 783]] --
-- RECALL 1.0 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/7
6499/6499 [==============================] - 1s 146us/step - loss: 0.1689 - accuracy: 0.8201 - val_loss: 0.0490 - val_accuracy: 0.9237
Epoch 2/7
6499/6499 [==============================] - 1s 145us/step - loss: 0.0310 - accuracy: 0.9697 - val_loss: 0.0092 - val_accuracy: 0.9877
Epoch 3/7
6499/6499 [==============================] - 1s 139us/step - loss: 0.0116 - accuracy: 0.9895 - val_loss: 0.0041 - val_accuracy: 0.9969
Epoch 4/7
6499/6499 [==============================] - 1s 138us/step - loss: 0.0062 - accuracy: 0.9946 - val_loss: 0.0020 - val_accuracy: 1.0000
Epoch 5/7
6499/6499 [==============================] - 1s 140us/step - loss: 0.0047 - accuracy: 0.9965 - val_loss: 0.0011 - val_accuracy: 1.0000
Epoch 6/7
6499/6499 [==============================] - 1s 147us/step - loss: 0.0032 - accuracy: 0.9985 - val_loss: 7.4790e-04 - val_accuracy: 1.0000
Epoch 7/7
6499/6499 [==============================] - 1s 158us/step - loss: 0.0027 - accuracy: 0.9983 - val_loss: 4.7558e-04 - val_accuracy: 1.0000
-- CONFUSION MATRIX
 [[842   0]
 [  0 783]] --
-- RECALL 1.0 --
Train on 6499 samples, validate on 1625 samples
Epoch 1/7
6499/6499 [==============================] - 1s 150us/step - loss: 0.2122 - accuracy: 0.7733 - val_loss: 0.1076 - val_accuracy: 0.9114
Epoch 2/7
6499/6499 [==============================] - 1s 148us/step - loss: 0.0576 - accuracy: 0.9452 - val_loss: 0.0184 - val_accuracy: 0.9772
Epoch 3/7
6499/6499 [==============================] - 1s 154us/step - loss: 0.0181 - accuracy: 0.9829 - val_loss: 0.0097 - val_accuracy: 0.9858
Epoch 4/7
6499/6499 [==============================] - 1s 148us/step - loss: 0.0104 - accuracy: 0.9892 - val_loss: 0.0068 - val_accuracy: 0.9951
Epoch 5/7
6499/6499 [==============================] - 1s 140us/step - loss: 0.0073 - accuracy: 0.9942 - val_loss: 0.0032 - val_accuracy: 0.9975
Epoch 6/7
6499/6499 [==============================] - 1s 138us/step - loss: 0.0050 - accuracy: 0.9972 - val_loss: 0.0022 - val_accuracy: 0.9982
Epoch 7/7
6499/6499 [==============================] - 1s 148us/step - loss: 0.0032 - accuracy: 0.9985 - val_loss: 0.0017 - val_accuracy: 0.9982
-- CONFUSION MATRIX
 [[842   0]
 [  3 780]] --
-- RECALL 0.9961685823754789 --
Wall time: 31.9 s
In [108]:
# printStats_v2_4(history_list, "ARCHITECTURE 2")
arch3_vals_list,arch3_mean_list = printStats(history_list_3, "ARCHITECTURE 3")
ARCHITECTURE ARCHITECTURE 3 - 
Average accuracy: 0.9750510454177856
Average loss: 0.02454814493740959
Average validation accuracy: 0.9906153891767774
Average validation loss: 0.008805676601237497
In [116]:
plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(arch2_mean_list['acc'])

plt.ylabel('Accuracy %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(arch2_mean_list['val_acc'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(arch2_mean_list['loss'])
plt.ylabel('Cross Entropy Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(arch2_mean_list['val_loss'])
plt.xlabel('epochs')
Out[116]:
Text(0.5, 0, 'epochs')
In [109]:
arch3_recall = np.average(recall_list_3)
print("AVG Recall: {}".format(arch3_recall))
AVG Recall: 0.9984035759897829
In [110]:
plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(arch3_mean_list['acc'])

plt.ylabel('Accuracy %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(arch3_mean_list['val_acc'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(arch3_mean_list['loss'])
plt.ylabel('Cross Entropy Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(arch3_mean_list['val_loss'])
plt.xlabel('epochs')
Out[110]:
Text(0.5, 0, 'epochs')
In [111]:
# obtain ROC values for architecture 1
y_pred3 = model3.predict(X_test_master[0])
#false positve and true postive rates using roc
fpr_3, tpr_3, thresholds_3 = roc_curve(y_test[0], y_pred3)
#area under the curve
auc_3 = auc(fpr_3, tpr_3)

In this model there was an improvement in recall score but the loss was approximately the same. Therefore there could be some overfitting occuring in this model. That being said though, we were still able hit better percentages than the second model in earlier epochs.

This model is still the best we have seen yet.

Comparing wide and deep networks

In [112]:
print("AVG Model 1 Recall: {}".format(arch1_recall))
print("AVG Model 2 Recall: {}".format(arch2_recall))
print("AVG Model 3 Recall: {}".format(arch3_recall))
AVG Model 1 Recall: 0.9955300127713921
AVG Model 2 Recall: 0.9980842911877394
AVG Model 3 Recall: 0.9984035759897829
In [113]:
plt.figure(figsize=(12,12))

#plot halfway line
plt.plot([0,1], [0,1], 'k--')

#plot model 1 ROC
plt.plot(fpr_1, tpr_1, label='Model 1 (area = {:.3f})'.format(auc_1))

#plot model 2 ROC
plt.plot(fpr_2, tpr_2, label='Model 2 (area = {:.3f})'.format(auc_2))

#plot model 3 ROC
plt.plot(fpr_3, tpr_3, label='Model 3 (area = {:.3f})'.format(auc_3))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('All Wide and Deep ROC curves')
plt.legend(loc='best')
plt.show()

In the ROC plot seen above, all three models perform extremely well. They almost have a perfect prediction for a given mushroom which is really good to see! However, we were expecting to achieve really good results with this dataset. It was very well maintained and had a lot of data to learn from, thus, it makes sense that is can accurately predict mushrooms really well. All three models also perform essentially the same, in fact, they all share the exact same AUC. However, we do see a slight difference in the evaluation criteria we chose for these models. Model 3 actually ends up becoming the best model since it is the exact same as model 2 but has a higher recall score by ~0.04%. Thus, Model 3 is our best wide and deep model.

Comparison to MLP

Now I will compare the best wide and deep network to sklearns MLP. In this case, the third model was the best as it had the best ROC and Recall score.

In [57]:
X_train_ints = []
X_test_ints = []

for i in range(len(X_train)):
    X_train_ints.append(X_train[i][feature_columns])
    X_test_ints.append(X_test[i][feature_columns])
In [58]:
from sklearn.neural_network import MLPClassifier

yhat_list = []
mlp_recall_list = []

mlp = MLPClassifier(hidden_layer_sizes=(6,),
                    learning_rate_init=0.01,
                    random_state=1,
                    activation='relu')

for i in range(len(X_train)):
    mlp.fit(X_train_ints[i], y_train[i])
    yhat_list.append(mlp.predict(X_test_ints[i]))
    mlp_recall_list.append(recall_score(y_test[i], yhat_list[i]))
    print("MLP Recall Score: ", mlp_recall_list[i])
    print("MLP Accuracy Score: ", accuracy_score(y_test[i], yhat_list[i]))
    

mlp_avg_recall = np.average(mlp_recall_list)

print("Avg MLP Recall Score: ", mlp_avg_recall)

#false positve and true postive rates using roc
# for i in range(len(yhat_list)):
fpr_mlp, tpr_mlp, thresholds_mlp = roc_curve(y_test[0], yhat_list[0])

#area under the curve
auc_mlp = auc(fpr_mlp, tpr_mlp)
MLP Recall Score:  1.0
MLP Accuracy Score:  0.9969230769230769
MLP Recall Score:  0.9118773946360154
MLP Accuracy Score:  0.9532307692307692
C:\Users\liaml\Anaconda3\envs\mlenv\lib\site-packages\sklearn\neural_network\multilayer_perceptron.py:566: ConvergenceWarning:

Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.

MLP Recall Score:  0.9897828863346104
MLP Accuracy Score:  0.9950769230769231
MLP Recall Score:  0.9757343550446999
MLP Accuracy Score:  0.9815384615384616
Avg MLP Recall Score:  0.9693486590038315
In [114]:
plt.figure(figsize=(12,12))

#plot halfway line
plt.plot([0,1], [0,1], 'k--')

#plot Wide and Deep ROC
plt.plot(fpr_3, tpr_3, label='Wide and Deep (area = {:.3f})'.format(auc_3))

#plot MLP ROC
plt.plot(fpr_mlp, tpr_mlp, label='MLP (area = {:.3f})'.format(auc_mlp))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Wide and Deep vs MLP ROC curve')
plt.legend(loc='best')
plt.show()

As seen in the graph above, the MLP and the Wide and Deep perform essentially exactly the same. This was expected as stated above, the data set we chose is truly a really well constructed data set and has a lot of categorical features to memorize. However, that being said, we still see that the wide and deep model performs better as it has a higher AUC and recall score.

Additional Analysis

Weight Viusalization

In [136]:
weights_cat_int = []
for l in model3.layers:
    if '_embed' in l.name and l.name[:-6] in feature_columns:
        print(l.name)
        weights_cat_int.append(l.get_weights())
cap-shape_int_embed
cap-surface_int_embed
cap-color_int_embed
bruises_int_embed
odor_int_embed
gill-attachment_int_embed
gill-spacing_int_embed
gill-size_int_embed
gill-color_int_embed
stalk-shape_int_embed
stalk-surface-above-ring_int_embed
stalk-surface-below-ring_int_embed
stalk-color-above-ring_int_embed
stalk-color-below-ring_int_embed
veil-type_int_embed
veil-color_int_embed
ring-number_int_embed
ring-type_int_embed
spore-print-color_int_embed
population_int_embed
habitat_int_embed
In [170]:
weights_cat_int[-1][0]
Out[170]:
array([[ 0.04649292,  0.0610834 ],
       [ 0.00086852,  0.08206345],
       [-0.00218932,  0.04758379],
       [ 0.03826057,  0.01329113],
       [ 0.0562977 ,  0.05891328],
       [-0.00835472, -0.02543542],
       [-0.04120474,  0.1280169 ]], dtype=float32)
In [172]:
import matplotlib.pyplot as plt
import numpy as np

n_categories = weights_cat_int[-1][0].shape[0]
# plot the mean abs value of each of the weights
mean_abs = [np.absolute(np.mean(weights_cat_int[-1][0][i:i+1])) for i in range(n_categories)]

fig = plt.figure(figsize=(10,5))
plt.title("Habitat: Magnitude of Mean Weights", fontsize=20)
plt.xlabel('Habitat type', fontsize=16)
plt.ylabel('Mean of Weights (Maginitude)', fontsize=16)
plt.bar(range(n_categories), mean_abs)

ticks = []
for i in range(n_categories):
    ticks.append(encoders['habitat'].inverse_transform([i])[0])

plt.xticks(range(n_categories), ticks, ha='right')
plt.show()

As seen above, the average weights are calculated per classification type for the habitat feature. We can see that above the woods biome (d) has a high magnitude as well as the path biome (p).

This tells us that mushrooms from woodsy and path-like habitats influenced the edibility of a given mushroom the most. This is good as we anticipate a lot of our users to be going on walks through forests.

In [177]:
n_categories = weights_cat_int[4][0].shape[0]
# plot the mean abs value of each of the weights
mean_abs = [np.absolute(np.mean(weights_cat_int[4][0][i:i+1])) for i in range(n_categories)]

fig = plt.figure(figsize=(10,5))
plt.title("Odor: Magnitude of Mean Weights", fontsize=20)
plt.xlabel('Odor type', fontsize=16)
plt.ylabel('Mean of Weights (Maginitude)', fontsize=16)
plt.bar(range(n_categories), mean_abs)

ticks = []
for i in range(n_categories):
    ticks.append(encoders['odor'].inverse_transform([i])[0])

plt.xticks(range(n_categories), ticks, ha='right')
plt.show()

In this weight visualization, we are looking at the odor type. As seen in the graph above, the most impactful odor type on our prediction task was anise (l) and fishy (y). This is also pretty interesting and good for us as the most impactful smelling mushrooms were those that smelled like fennel and those that smell fishy, two scents almost everyone is familiar with!

In [ ]: